CADDeLaG: Framework for distributed anomaly detection in large dense graph sequences

نویسندگان

  • Aniruddha Basak
  • Kamalika Das
  • Ole J. Mengshoel
چکیده

Random walk based distance measures for graphs such as commutetime distance are useful in a variety of graph algorithms, such as clustering, anomaly detection, and creating low dimensional embeddings. Since such measures hinge on the spectral decomposition of the graph, the computation becomes a bottleneck for large graphs and do not scale easily to graphs that cannot be loaded in memory. Most existing graph mining libraries for large graphs either resort to sampling or exploit the sparsity structure of such graphs for spectral analysis. However, such methods do not work for dense graphs constructed for studying pairwise relationships among entities in a data set. Examples of such studies include analyzing pairwise locations in gridded climate data for discovering long distance climate phenomena. These graphs representations are fully connected by construction and cannot be sparsified without loss of meaningful information. In this paper we describe CADDeLaG, a framework for scalable computation of commute-time distance based anomaly detection in large dense graphs without the need to load the entire graph in memory. The framework relies on Apache Spark’s memory-centric cluster-computing infrastructure and consists of two building blocks: a decomposable algorithm for commute time distance computation and a distributed linear system solver. We illustrate the scalability of CADDeLaG and its dependency on various factors using both synthetic and real world data sets. We demonstrate the usefulness of CADDeLaG in identifying anomalies in a climate graph sequence, that have been historically missed due to ad-hoc graph sparsification and on an election donation data set. 1 ar X iv :1 80 2. 05 42 1v 1 [ cs .D C ] 1 5 Fe b 20 18 CADDeLaG: Framework for distributed anomaly detection in large dense graph sequences Aniruddha Basak∗1, Kamalika Das†2, and Ole J. Mengshoel‡3 Carnegie Mellon University NASA Ames Research Center

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use boosting ensemble of weak classifiers to implement misuse intrusion detection system. It can identify new classes types of intrusions that do not exist in the training dataset for incremental misuse detection. As...

متن کامل

Moving dispersion method for statistical anomaly detection in intrusion detection systems

A unified method for statistical anomaly detection in intrusion detection systems is theoretically introduced. It is based on estimating a dispersion measure of numerical or symbolic data on successive moving windows in time and finding the times when a relative change of the dispersion measure is significant. Appropriate dispersion measures, relative differences, moving windows, as well as tec...

متن کامل

NScale: Neighborhood-centric Analytics on Large Graphs

There is an increasing interest in executing rich and complex analysis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego network analysis, motif counting in biological networks, finding social circles, personalized recommendations, link prediction, anomaly de...

متن کامل

Network-Wide Anomaly Detection Based on Router Connection Relationships

Detecting distributed anomalies rapidly and accurately is critical for efficient backbone network management. In this letter, we propose a novel anomaly detection method that uses router connection relationships to detect distributed anomalies in the backbone Internet. The proposed method unveils the underlying relationships among abnormal traffic behavior through closed frequent graph mining, ...

متن کامل

A Hierarchical Framework for Classifying and Assessing Internet Traffic Anomalies

We present ALARM (HierarchicAL AppRoach for AnoMaly Detection), a hierarchical approach for correlation and prioritization of alerts in distributed networks. The goal is to monitor, classify, correlate and assess a large number of alerts generated at spatially distributed sites on the Internet. The alerts correspond to anomalies and hence potential unknown threats. To facilitate our analysis, w...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1802.05421  شماره 

صفحات  -

تاریخ انتشار 2018